36 research outputs found

    Bilateral Deep Reinforcement Learning Approach for Better-than-human Car Following Model

    In the coming years and decades, autonomous vehicles (AVs) will become increasingly prevalent, offering new opportunities for safer and more convenient travel and potentially smarter traffic control methods exploiting automation and connectivity. Car following is a prime function in autonomous driving. Car following based on reinforcement learning (RL) has received attention in recent years with the goal of learning and achieving performance levels comparable to humans. However, most existing RL methods model car following as a unilateral problem, sensing only the vehicle ahead. Recent literature (Wang and Horn [16]), however, has shown that bilateral car following, which considers both the vehicle ahead and the vehicle behind, exhibits better system stability. In this paper we hypothesize that this bilateral car following can be learned using RL, while learning other goals such as efficiency maximization, jerk minimization, and safety rewards, leading to a learned model that outperforms human driving. We propose a Deep Reinforcement Learning (DRL) framework for car following control that integrates bilateral information into both the state and the reward function, based on the bilateral control model (BCM). Furthermore, we use a decentralized multi-agent reinforcement learning framework to generate the corresponding control action for each agent. Our simulation results demonstrate that our learned policy is better than the human driving policy in terms of (a) inter-vehicle headways, (b) average speed, (c) jerk, (d) Time to Collision (TTC) and (e) string stability.
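The abstract lists Time to Collision (TTC) among its evaluation metrics without defining it. As a minimal sketch, assuming the common definition (gap divided by closing speed, and infinite when the follower is not closing in), TTC could be computed as below. The function and parameter names are hypothetical and are not taken from the paper.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """Return TTC in seconds under the common gap / closing-speed definition.

    Assumption: both vehicles hold their current speeds. When the follower
    is not faster than the leader, no collision is projected and the
    function returns math.inf.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed
```

For example, a 20 m gap with the follower at 15 m/s and the leader at 10 m/s gives a closing speed of 5 m/s and a TTC of 4 s.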

    Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

    Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since they can extract more learning signals from the logged dataset by learning a model of the environment. However, the performance of existing model-based approaches falls short of model-free counterparts, due to the compounding of estimation errors in the learned model. Driven by this observation, we argue that it is critical for a model-based method to understand when to trust the model and when to rely on model-free estimates, and how to act conservatively w.r.t. both. To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), that trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties, and facilitates conservatism by taking a lower bound on the Bayesian posterior value estimate. On the standard D4RL continuous control tasks, we find that our method significantly outperforms previous model-based approaches: e.g., MOPO by 116.4%, MOReL by 23.2% and COMBO by 23.7%. Further, CBOP achieves state-of-the-art performance on 11 out of 18 benchmark datasets while doing on par on the remaining datasets.
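The idea of weighting value estimates by their uncertainty and then taking a lower bound can be illustrated with a small sketch. This is not the authors' CBOP implementation. It assumes independent Gaussian estimates with known variances and a flat prior, in which case the posterior mean is the precision-weighted average. The function name and the `kappa` parameter (how many posterior standard deviations to subtract) are hypothetical.

```python
import math


def conservative_value(estimates, variances, kappa=1.0):
    """Fuse value estimates by precision weighting, then take a lower bound.

    Assumptions (illustrative only): each estimate is an independent
    Gaussian observation of the true value with the given variance, and
    the prior is flat. Returns (lower_bound, posterior_mean, posterior_std).
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    std = math.sqrt(1.0 / total_precision)
    return mean - kappa * std, mean, std
```

With estimates `[10.0, 20.0]` and variances `[1.0, 4.0]`, the more certain estimate dominates: the posterior mean is 12.0, and the lower bound sits one posterior standard deviation below it.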

    A Neuro-Genetic-Based Universally Transferable Freeway Incident Detection Framework

    A universal freeway incident detection framework is a task that remains unfulfilled despite the promising approaches that have been recently explored. The need for an operationally successful incident detection and management system, as a vital component of any advanced traffic management system, is well established and recognized. Only recently, however, have researchers and practitioners begun to realize that for an incident detection framework to be universally operational and successful, it needs to fulfill all components of a set of recognized needs. It is the objective of this research to define those universality requirements and produce an incident detection framework that possesses the potential to fulfill them. A new potentially universal freeway incident detection framework has been proposed, developed and evaluated. The research effort was started by defining a comprehensive set of requirements that any universal incident detection algorithm or framework should fulfill. Among these requirements, an incident detection algorithm needs to be operationally accurate, automatically transferable, and capable of automatically adapting to changes in the freeway environment. This set of universality requirements was used as a template against which all algorithms within the scope of this study have been evaluated. Three major incident and loop detector databases were heavily utilized, two of which are unprecedented real databases collected from two major freeway sites in California and Minnesota, namely the Alameda County I-880 freeway database and the Minneapolis I-35W database. The universality of the most well-known existing incident detection algorithms was tested using the above databases. A serious lack of universality, particularly transferability, was detected in all existing algorithms.
Prior to the development of the new universal framework, limits on acceptable performance were elicited from TMC surveys conducted as part of this effort. Preliminary investigation of two promising advanced neural networks, namely the Logicon and the PNN, was conducted. The PNN was more appealing due to its universality potential. The PNN was modified using a principal components transformation layer that resulted in performance enhancements. This, together with its potential universality, led to the choice of the modified PNN for in-depth development. The in-depth development stage was divided into three phases. The first was the extraction of a new and improved input feature set that produced more distinct classes in the input feature space. The new features enhanced the transferability of the PNN and made the framework more compliant with the universality requirements. The second phase was the on-site real-time retraining of the PNN after transferability, a phase that produced near-optimal classification results and detection performance. The third phase was the development of a post-processor output interpreter that linked the isolated 30-second outputs of the PNN and produced a sequentially updated probabilistic measure of the existence of an incident in the field. The overall PNN-based framework was found to be fully compliant with the entire set of universality requirements. Finally, a new approach for training a multi-smoothing-parameter version of the PNN was investigated. The approach utilized genetic algorithms for optimizing the selection of the smoothing parameters. The results indicated an improvement in performance over the single-smoothing-parameter PNN, but at the expense of longer training time. The superiority and universality of a particular advanced neural network model, namely the PNN, was concluded in this research, as compared to the Logicon and the MLF neural networks, as well as existing conventional freeway incident detection algorithms.
Adding the principal components transformation layer to the PNN was found to enhance its performance. Although the genetically optimized version of the PNN showed better transferability, both versions showed equally good performance after retraining. The PNN was concluded to be more practical for TMC implementation due to its instantaneous training capabilities.
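For readers unfamiliar with the probabilistic neural network (PNN), the sketch below shows the standard single-smoothing-parameter form: a Gaussian Parzen-window density estimate per class, with the highest-scoring class chosen. It is a generic illustration under the assumption of equal class priors, not the modified PNN from this research. It omits the principal components layer, the feature set, and the post-processor, and all names and feature values are hypothetical.

```python
import math


def pnn_classify(x, training_data, sigma=1.0):
    """Classify x with a basic PNN (Gaussian Parzen-window classifier).

    training_data maps each class label to a list of stored feature
    vectors. "Training" is just storing these patterns, which is why a
    PNN can be retrained almost instantly. sigma is the single smoothing
    parameter shared by all kernels. Equal class priors are assumed.
    Returns (best_label, scores).
    """
    scores = {}
    for label, patterns in training_data.items():
        total = 0.0
        for pattern in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Average kernel response approximates the class-conditional density.
        scores[label] = total / len(patterns)
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical two-feature example (values are illustrative, not real data).
example_data = {
    "incident": [[0.9, 0.8], [1.0, 0.7]],
    "normal": [[0.1, 0.2], [0.0, 0.1]],
}
label, _ = pnn_classify([0.95, 0.75], example_data, sigma=0.3)
```

A multi-smoothing-parameter variant would replace the shared `sigma` with one value per feature, and those values are what an optimizer such as a genetic algorithm would search over.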

    Automated Adaptive Traffic Corridor Control Using Reinforcement Learning: Approach and Case Studies

    Advancements in intelligent transportation systems and communication technology could considerably reduce delay and congestion through an array of networkwide traffic control and management strategies. The two most promising control tools for freeway corridors are traffic-responsive ramp metering and dynamic traffic diversion using variable message signs (VMSs). The use of these control methods independently could limit their usefulness. Therefore, integrated corridor control by using ramp metering and VMS diversion simultaneously could be beneficial. Administration of freeways and adjacent arterials often falls under different jurisdictional authorities. Lack of coordination among those authorities caused by lack of means for information exchange or “institutional gridlock” could hinder the full potential of technically possible integrated control. Fully automating corridor control could alleviate this problem. Research was conducted to develop a self-learning adaptive integrated freeway–arterial corridor control for both recurring and nonrecurring congestion. Reinforcement learning, an artificial intelligence method for machine learning, is used to provide a single, multiple, or integrated optimal control agent for a freeway or freeway–arterial corridor for both recurrent and nonrecurrent congestion. The microsimulation tool Paramics, which has been used to train and evaluate the agent in an offline mode within a simulated environment, is described. Results from various simulation case studies in the Toronto, Canada, area are encouraging and have demonstrated the effectiveness and superiority of the technique

    Multi-Agent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC)

    Traffic congestion in the Greater Toronto Area costs Canada $6 billion/year and is expected to grow up to $15 billion/year in the next few decades. Adaptive Traffic Signal Control (ATSC) is a promising technique to alleviate traffic congestion. For medium-large transportation networks, coordinated ATSC is becoming a challenging problem because the number of system states and actions grows exponentially as the number of networked intersections grows. Efficient and robust controllers can be designed using a multi-agent reinforcement learning (MARL) approach in which each controller (agent) is responsible for the control of traffic lights around a single traffic junction. This paper presents a novel, decentralized and coordinated adaptive real-time traffic signal control system using Multi-Agent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC) that aims to minimize the total vehicle delay in the traffic network. The system is tested using microscopic traffic simulation software (PARAMICS) on a network of 5 signalized intersections in Downtown Toronto. The performance of MARLIN-ATSC is compared against two approaches: the conventional pretimed signal control (B1) and independent RL-based control agents (B2), i.e. with no coordination. The results show that network-wide average delay savings range from 32% to 63% relative to B1 and from 7% to 12% relative to B2 under different demand levels and arrival profiles.

    Development Testing And Evaluation Of Advanced Techniques For Freeway Incident Detection

    In this research, the authors introduce and define a universal incident detection framework that is capable of fulfilling all components of a set of recognized needs. An algorithm is presented that has the potential to fulfill the defined universality requirements. It is a modified form of a probabilistic neural network (PNN) that utilizes the concept of statistical distance. The first part of the report presents a definition of the attributes and capabilities that a potentially universal freeway incident detection framework should possess. The second part discusses the training and testing of the PNN. The third section evaluates the PNN relative to the proposed universality template. In addition to a large set of simulated incidents, the authors utilize a large real incident database from the I-880 freeway in California to comparatively evaluate the performance and transferability of different algorithms including the PNN. Key words: Bayesian statistical decision theory, Neural networks (Computer science), Automatic incident detection

    Assessment of streetcar transit priority options using microsimulation modelling

    The thrust of a recently published transportation vision for Toronto is focused largely on reducing automobile dependence via a number of interacting strategies, including the wide application of transit priority policies to improve transit competitiveness. This paper reports on quantifying the impacts of several transit priority schemes, with the streetcar operation along King Street in the heart of Toronto as a case study. Four scenarios were modelled in a microsimulation framework. They include the status quo (involving unconditional transit signal priority, already in operation), turning off existing transit signal priority, prohibiting all left turns, and finally prohibiting traffic from King Street. To quantify the impacts of each scenario, a set of common measures of effectiveness was used, including transit travel time and speed, effective headway, service frequency and person throughput, bunching, fleet size implications, and overall traffic and transit average speeds. The results show the relative merits of the four scenarios, and two strategies for improving streetcar service along the King route are recommended. The first is to prohibit all left turns along the route, while the second, admittedly more aggressive, is to potentially transform the arterial into a transit mall accessible only to streetcars. Key words: transit signal priority, transit priority, signal control, microsimulation, streetcars